7 research outputs found

TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation

Author: Davis Larry S.
Gao Mingfei
Hu Yuqian
JaJa Joseph F.
Ramaiah Chetan
Selvaraju Ramprasaath R.
Wang Jun
Xu Ran
Publication date: 07/10/2022

Text-VQA aims at answering questions that require understanding the textual cues in an image. Despite the great progress of existing Text-VQA methods, their performance suffers from insufficient human-labeled question-answer (QA) pairs. However, we observe that, in general, the scene text is not fully exploited in the existing datasets: only a small portion of the text in each image participates in the annotated QA activities. This results in a huge waste of useful information. To address this deficiency, we develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the existing rich text available in the scene context of each image. Specifically, we propose TAG, a text-aware visual question-answer generation architecture that learns to produce meaningful and accurate QA samples using a multimodal transformer. The architecture exploits underexplored scene text information and enhances scene understanding of Text-VQA models by combining the generated QA pairs with the initial training data. Extensive experimental results on two well-known Text-VQA benchmarks (TextVQA and ST-VQA) demonstrate that our proposed TAG effectively enlarges the training data, which helps improve Text-VQA performance without extra labeling effort. Moreover, our model outperforms state-of-the-art approaches that are pre-trained with extra large-scale data. Code is available at https://github.com/HenryJunW/TAG.
Comment: BMVC 202

arXiv.org e-Print Archive

Challenges in Representation Learning: A report on three machine learning contests

Author: Athanasakis Dimitris
Bengio Yoshua
Bergstra James
Carrier Pierre Luc
Chuang Zhang
Courville Aaron
Cukierski Will
Erhan Dumitru
Feng Fangxiang
Goodfellow Ian J.
Grozea Cristian
Hamner Ben
Ionescu Radu
Lee Dong-Hyun
Li Ruifan
Milakov Maxim
Mirza Mehdi
Park John
Popescu Marius
Ramaiah Chetan
Romaszko Lukasz
Shawe-Taylor John
Tang Yichuan
Thaler David
Wang Xiaojie
Xie Jingjing
Xu Bing
Zhou Yingbo
Publication date: 01/01/2013

The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.
Comment: 8 pages, 2 figure

arXiv.org e-Print Archive

Crossref

Fraunhofer-ePrints

A sigma-lognormal model for handwritten text CAPTCHA generation

Author: Govindaraju Venu
Plamondon Réjean
Ramaiah Chetan
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 01/01/2014

PolyPublie

Coupled IGMM-GANs with Applications to Anomaly Detection in Human Mobility Data

Author: LeCun Yann
Molano-Mazon Manuel
Ramaiah Chetan
van der Maaten Laurens
Zhou Fan
Publication venue: Association for Computing Machinery (ACM)

Crossref

IoT Based Acetone Level Monitoring Using Non-Invasive Method for Diabetic Patients

Author: Akansha Saxena
C Wang
Chetan Sharma
M Lieschnegg
M. J. Sharanya
Narayana Swamy Ramaiah
Publication venue: Elsevier BV
Publication date: 01/01/2019

Crossref